fix(groupby): group by non-dimension coordinate names; fast multi-key grouping by names (#750, #753) by FBumann · Pull Request #751 · PyPSA/linopy

FBumann · 2026-06-04T11:41:11Z

What

Brings LinearExpression.groupby to parity with xarray.Dataset.groupby for grouping by coordinate names — on the fast path, with no breaking change. Two related pieces:

1. Group by an attached non-dimension coordinate (closes #750).

```python
expr = (1 * x).assign_coords(period=period)   # 'period' rides on the 'snapshot' dim
expr.groupby("period").sum()   # before: ValueError: period already exists as coordinate
expr.groupby(period).sum()     # before: KeyError: 'period'
```

Both now work and take the fast path, mirroring xarray. (Grouping by a dimension name always worked — only attached non-dim coordinates were broken.) The old workaround was to detach the coord first (expr.drop_vars("period").groupby(period).sum()).
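For comparison, this is the xarray behaviour being mirrored, shown on a plain xarray.Dataset with assumed toy data rather than on a linopy expression. Both spellings group by the attached coordinate and collapse `snapshot` into a `period` dimension:

```python
import numpy as np
import xarray as xr

# Toy data (assumed): 'period' is a non-dimension coordinate riding on the 'snapshot' dim.
ds = xr.Dataset(
    {"x": ("snapshot", np.arange(6.0))},
    coords={
        "snapshot": np.arange(6),
        "period": ("snapshot", [2030, 2030, 2030, 2040, 2040, 2040]),
    },
)

# Group by the coordinate's name ...
by_name = ds.groupby("period").sum()
# ... or by the attached coordinate itself; both collapse 'snapshot' into 'period'.
by_array = ds.groupby(ds["period"]).sum()

print(by_name["x"].values.tolist())  # [3.0, 12.0]
```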

2. Multi-key grouping by names now takes the fast path (closes #753).

```python
expr.groupby(["period", "season"]).sum()   # before: silently dropped to the slow xarray fallback
```

It now rides the reindex fast path and returns the same output — one dimension per key, byte-identical to the fallback, sparse fill cells included. The pd.DataFrame grouper is untouched and keeps its compact stacked-MultiIndex output, so this is non-breaking.
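The difference between the two output shapes can be illustrated with plain pandas. This is an analogy with assumed toy data, not linopy's implementation: grouping by several keys yields a compact stacked MultiIndex over the observed combinations, while unstacking into one axis per key produces the full cartesian grid with fill cells (NaN here):

```python
import pandas as pd

# A sparse crossing (assumed data): only the "diagonal" (period, season) pairs occur.
df = pd.DataFrame({
    "period": [2030, 2030, 2040, 2040, 2050],
    "season": ["winter", "winter", "summer", "summer", "spring"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Compact: one row per observed combination (stacked MultiIndex), like the DataFrame grouper.
stacked = df.groupby(["period", "season"])["value"].sum()

# One axis per key: a dense 3 x 3 grid in which only 3 of the 9 cells are observed.
grid = stacked.unstack("season")
```

The dense form is what makes one-dimension-per-key output convenient to index, and also what makes it expensive for sparse crossings.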

How

In LinearExpressionGroupby:

- _resolve_group normalizes a key: it unwraps a single-element list (groupby(["period"]) → scalar) and resolves a string coordinate name to that coordinate, so the key takes the fast path.
- sum() drops every coordinate aligned to the grouped dimension before reshaping, so an attached auxiliary coordinate (including the one being grouped by) no longer collides on the final rename.
- The groupby property detaches a free (non-indexed) coordinate before handing it to xarray, which fixes the use_fallback=True path; it never detaches a MultiIndex level.
- A list of coordinate names (1-D, sharing one dimension) is gathered into a value frame so it rides the fast path, then unstacked back into one dimension per key.
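The key normalisation in the first item can be sketched as follows. `resolve_group` is a hypothetical stand-in written against a plain xarray.Dataset, not linopy's actual `_resolve_group`; it only shows the two rules described (unwrap a one-element list, resolve a coordinate name to the coordinate):

```python
import xarray as xr


def resolve_group(data: xr.Dataset, group):
    """Hypothetical sketch of the key normalisation described above (not linopy's code)."""
    # A single-element list or tuple groups like the scalar key: ["period"] -> "period".
    if isinstance(group, (list, tuple)) and len(group) == 1:
        group = group[0]
    # A string naming an existing coordinate resolves to that coordinate,
    # so downstream code receives a DataArray grouper.
    if isinstance(group, str) and group in data.coords:
        group = data.coords[group]
    return group


ds = xr.Dataset(
    {"x": ("snapshot", [1.0, 2.0, 3.0, 4.0])},
    coords={"period": ("snapshot", [2030, 2030, 2040, 2040])},
)
key = resolve_group(ds, ["period"])
print(type(key).__name__, key.name, key.dims)  # DataArray period ('snapshot',)
```

Multi-element lists pass through unchanged here; in the PR they are handled by the separate multi-key path in the last item.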

Memory

One dimension per key is a dense cartesian grid, so a sparse key crossing materialises mostly-fill cells (measured ~100× vs the compact DataFrame grouper for a diagonal crossing — see #740). A UserWarning nudges sparse/high-cardinality users to the DataFrame grouper; it reads the collapsed MultiIndex levels, so it is O(observed), not O(N), and fires before unstack allocates the grid. Getting separate dims and compact storage would need a sparse/long-format kernel — tracked in #757 (the groupby-densification follow-up) under the sparse-kernel umbrella #756.
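A minimal sketch of an O(observed) blow-up check, assuming the collapsed result is available as a pandas MultiIndex. The function name `warn_if_sparse` and the `max_ratio` threshold of 10 are hypothetical; the PR does not state the threshold it uses:

```python
import math
import warnings

import pandas as pd


def warn_if_sparse(index: pd.MultiIndex, max_ratio: float = 10.0) -> bool:
    """Hypothetical blow-up check; `max_ratio` is an assumed threshold."""
    # Only observed combinations and their distinct level values are touched: O(observed).
    index = index.remove_unused_levels()
    # Product of level sizes = number of cells that unstacking into one dim per key allocates.
    grid_cells = math.prod(len(level) for level in index.levels)
    observed = len(index)
    if grid_cells > max_ratio * observed:
        warnings.warn(
            f"one dimension per key allocates {grid_cells} cells for {observed} observed "
            "combinations; consider grouping by a pd.DataFrame instead",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Diagonal crossing: 50 observed combinations, 2500-cell grid, so the check warns.
diagonal = pd.MultiIndex.from_arrays(
    [list(range(50)), list(range(50))], names=["period", "season"]
)
# Dense crossing: 50 observed combinations, 50-cell grid, so the check stays silent.
dense = pd.MultiIndex.from_product([range(5), range(10)], names=["period", "season"])
```

Because the check runs on the stacked index, it can fire before any dense grid is allocated.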

Tests

TestGroupbyByAttachedCoordinate asserts grouped vars/coeffs against hard-coded results on a deterministic model: single-key (name/DataArray × use_fallback), 2-D variable, dimension-coordinate-by-name, single-element list, MultiIndex level, and a pytest.raises row pinning that a list of DataArrays is unsupported. TestMultiKeyFastPath covers the multi-key fast path: fast == fallback (list/tuple, including a sparse crossing), separate-dims-not-stacked, sparse-combination-filled, DataFrame-grouper-stays-compact, and the blow-up warning (fires when sparse, silent when dense). Full suite green; ruff + mypy clean.

Context: relation to PyPSA usage and #744

This is a building block for the flat-dimension + auxiliary-level-coord direction discussed in #744. If n.snapshots becomes a flat dimension carrying period / scenario level coordinates (instead of a stacked pd.MultiIndex), aggregating an expression over a level becomes expr.groupby("period").sum() — exactly what this PR makes work, now also for multiple levels at once (groupby(["period", "scenario"])).

PyPSA still carries per-period workarounds citing a broken MultiIndex groupby (pydata/xarray#6836); that upstream bug is now fixed (verified on xarray 2025.9.0), but those comments are about MultiIndex grouping and per-period rolling, orthogonal to this PR.

Closes #750, #753. Drafted with an agent (Claude Code).

🤖 Generated with Claude Code

`LinearExpression.groupby` could not group by an attached non-dimension coordinate. `expr.groupby("period").sum()` raised `ValueError: period already exists as coordinate or variable name`, and passing the coordinate `DataArray` (`groupby(period)`) raised because the fast path dropped only the dimension index, then renamed the group dim onto a name still held by the attached coordinate.

Fix both paths:

- `sum()` resolves a string group naming an existing coordinate to that coordinate so it takes the fast path, and drops every coordinate aligned to the grouped dimension (index, MultiIndex levels, auxiliary coords) before reshaping, since collapsing the dimension invalidates them all.
- The `groupby` property detaches an attached non-dimension coordinate used as the group before handing it to xarray, so xarray does not try to re-expand it when recombining groups (the `use_fallback=True` path).

`expr.groupby("period").sum()` now mirrors `xarray.Dataset.groupby`.

Closes #750

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
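The collision described in this commit can be mimicked on a plain xarray.Dataset (assumed toy data; this is not linopy's fast path). `drop_aligned_coords` is a hypothetical helper that drops every coordinate lying along the grouped dimension; once the auxiliary `period` coordinate is gone, the dimension can take that name:

```python
import numpy as np
import xarray as xr


def drop_aligned_coords(data: xr.Dataset, dim: str) -> xr.Dataset:
    """Hypothetical sketch: drop every coordinate lying along `dim` (index and auxiliary alike)."""
    aligned = [name for name, coord in data.coords.items() if dim in coord.dims]
    return data.drop_vars(aligned)


ds = xr.Dataset(
    {"x": ("snapshot", np.arange(4.0))},
    coords={
        "snapshot": np.arange(4),
        "period": ("snapshot", [2030, 2030, 2040, 2040]),
        "weight": ("snapshot", [1.0, 1.0, 2.0, 2.0]),
        "carrier": "gas",  # scalar coordinate, not aligned to 'snapshot', so it is kept
    },
)

# Renaming the dimension onto a name still held by the attached coordinate collides ...
try:
    ds.rename_dims({"snapshot": "period"})
except ValueError as err:
    print("collision:", err)

# ... while stripping the aligned coordinates first frees the name.
stripped = drop_aligned_coords(ds, "snapshot")
renamed = stripped.rename_dims({"snapshot": "period"})
print(sorted(stripped.coords), dict(renamed.sizes))  # ['carrier'] {'period': 4}
```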

Reorganize the #750 coverage into TestGroupbyByAttachedCoordinate, a parametrized matrix that asserts the grouped expression against hard-coded `vars`/`coeffs` literals (on a deterministic 4- and 8-variable model) instead of comparing to a sibling computation that could share the same bug. Covers single-key (name / DataArray × use_fallback), multi-key (list / tuple × use_fallback), an extra auxiliary coord on the grouped dimension, and a 2-D variable that must keep its other dimension.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tract

A single-element key list (`groupby(["period"])`) now groups like the scalar key, matching xarray -- it is unwrapped in both `sum()` and the `groupby` property. Multi-key grouping must be spelled with names (`["period", "season"]`); a list of `DataArray`s is unhashable and raises in xarray itself, so linopy mirrors that (covered by an explicit `pytest.raises` row in the matrix). Also shorten the test class docstring to house style.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The GH #750 detach must only drop *free* (non-indexed) coordinates. The earlier change also dropped a MultiIndex level when grouping by it via `use_fallback=True`, leaving the dimension without an index (`('snapshot',) are not coordinates with an index`). Guard the detach with `group.name not in data.xindexes` so MultiIndex levels are left intact. Grouping by a MultiIndex level now works on both paths (the pydata/xarray#6836 case, fixed upstream). Add a parametrized regression test over both levels and both paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
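The `xindexes` guard can be illustrated on plain xarray data (assumed toy values). `detach_if_free` is a hypothetical helper, not linopy's code: a free auxiliary coordinate is not a key of `Dataset.xindexes` and gets detached, while a MultiIndex level is registered there and is left alone:

```python
import xarray as xr


def detach_if_free(data: xr.Dataset, group: xr.DataArray) -> xr.Dataset:
    """Hypothetical sketch of the guard: detach a free coordinate, never an indexed one."""
    name = group.name
    if name in data.coords and name not in data.xindexes:
        return data.drop_vars(name)  # free auxiliary coordinate: safe to detach
    return data  # dimension coordinate or MultiIndex level: keep the index intact


values = {"x": ("snapshot", [1.0, 2.0, 3.0, 4.0])}
levels = {
    "period": ("snapshot", [2030, 2030, 2040, 2040]),
    "timestep": ("snapshot", [0, 1, 0, 1]),
}

flat = xr.Dataset(values, coords=levels)  # 'period' is a free coordinate
stacked = flat.set_index(snapshot=["period", "timestep"])  # 'period' is a MultiIndex level

print("period" in detach_if_free(flat, flat["period"]).coords)  # False
print("period" in detach_if_free(stacked, stacked["period"]).coords)  # True
```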

- Extract the single-element-list unwrap + coordinate-name resolution shared by `sum()` and the `groupby` property into one `_resolve_group` helper, removing the duplication (and the drift between the two that caused the earlier MultiIndex-level regression).
- Drop the now-stale `(GH #750)` references from code comments; the link lives in the release notes.
- Add a test for grouping by a dimension coordinate name (the fast-path broadening), and note it in the release note.
- Simplify `test_multi_key`: a multi-key group always uses the xarray fallback, so drop the redundant `use_fallback` parametrization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Grouping by a dimension coordinate or a MultiIndex level by name already worked; only the non-dimension (free) coordinate case was broken. Correct the release note, which had overclaimed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the verbose comment blocks; the rationale lives in the _resolve_group docstring and the regression tests enforce the invariants (e.g. test_multiindex_level guards the xindexes detach guard).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`groupby(["a","b"]).sum()` previously dropped to the slow xarray fallback. Resolve a list of coordinate names (1-D, same dim) to a value frame so it rides the existing reindex fast path, then unstack the stacked result back into one dimension per key -- byte-identical to the fallback, sparse fill cells included. The DataFrame grouper is untouched and stays compact (stacked MultiIndex over observed combinations only), so this is non-breaking.

One dimension per key is a dense cartesian grid, so a sparse key crossing materialises mostly-fill cells. Warn (pointing at the DataFrame grouper) when the grid is much larger than the observed combinations; the check reads the collapsed MultiIndex levels, so it is O(observed) and fires before unstack allocates.

See #753; sparse-representation follow-ups tracked against #740.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
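The "value frame, then unstack" shape of the multi-key path can be sketched with pandas and xarray. `names_to_frame` is hypothetical, and the reduction below uses a pandas groupby in place of linopy's reindex kernel, so this only illustrates the data flow (1-D names sharing one dim, a stacked reduction over observed combinations, then one dimension per key with fill for unobserved cells):

```python
import pandas as pd
import xarray as xr


def names_to_frame(data: xr.Dataset, names: list) -> pd.DataFrame:
    """Hypothetical: gather 1-D coordinates that share one dimension into a value frame."""
    coords = [data.coords[name] for name in names]
    dims = {coord.dims for coord in coords}
    if len(dims) != 1 or len(next(iter(dims))) != 1:
        raise ValueError("multi-key names must be 1-D coordinates along one shared dimension")
    return pd.DataFrame({name: coord.values for name, coord in zip(names, coords)})


ds = xr.Dataset(
    {"x": ("snapshot", [1.0, 2.0, 3.0, 4.0])},
    coords={
        "period": ("snapshot", [2030, 2030, 2040, 2040]),
        "season": ("snapshot", ["winter", "summer", "winter", "winter"]),
    },
)
frame = names_to_frame(ds, ["period", "season"])

# Reduce once over the observed combinations (a stacked MultiIndex), here with pandas ...
stacked = pd.Series(ds["x"].values).groupby([frame["period"], frame["season"]]).sum()
# ... then unstack into one dimension per key; (2040, "summer") is never observed, so it is NaN.
per_key = stacked.to_xarray()
```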

FBumann and others added 4 commits June 4, 2026 13:40

docs: make groupby #750 release note concise and user-facing

9146218

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

FBumann marked this pull request as ready for review June 4, 2026 12:13

FBumann and others added 4 commits June 4, 2026 14:23

This was referenced Jun 4, 2026

Make multi-key LinearExpression.groupby([names]) fast and flat (align with xarray's grouper API) #753

Closed

feat(groupby): fast path for multi-key groupby([names]), flat + non-breaking #755

Merged

FBumann changed the title ~~fix(groupby): group by the name of a non-dimension coordinate (#750)~~ fix(groupby): group by non-dimension coordinate names; fast & flat multi-key grouping (#750, #753) Jun 4, 2026

FBumann force-pushed the fix/groupby-coord-name branch from 63e82ef to ae23a3c Compare June 4, 2026 14:48

FBumann changed the base branch from master to feat/groupby-parity June 4, 2026 14:48

FBumann changed the title ~~fix(groupby): group by non-dimension coordinate names; fast & flat multi-key grouping (#750, #753)~~ fix(groupby): group by the name of a non-dimension coordinate (#750) Jun 4, 2026

FBumann changed the base branch from feat/groupby-parity to master June 4, 2026 14:55

FBumann changed the title ~~fix(groupby): group by the name of a non-dimension coordinate (#750)~~ fix(groupby): group by non-dimension coordinate names; fast & flat multi-key grouping (#750, #753) Jun 4, 2026

FBumann force-pushed the fix/groupby-coord-name branch from 63e82ef to de067dc Compare June 4, 2026 15:20

FBumann changed the title ~~fix(groupby): group by non-dimension coordinate names; fast & flat multi-key grouping (#750, #753)~~ fix(groupby): group by non-dimension coordinate names; fast multi-key grouping by names (#750, #753) Jun 4, 2026

FBumann force-pushed the fix/groupby-coord-name branch from de067dc to 3cc774f Compare June 4, 2026 15:23

This was referenced Jun 4, 2026

Multi-key groupby([names]).sum() densifies to a dense cartesian grid (memory) #757

Open

Umbrella: long-format / sparse _term kernel (dense-_term memory cluster) #756

Open

docs: cut release notes for v0.8.0 #759

Open

FBumann wants to merge 9 commits into master from fix/groupby-coord-name
